WHAT IS CLAIMED IS:

1. A document type definition generating method,
comprising, in a structured document provided with a
tag having an element name in each document element:

a physical structure judging step of judging a
physical structure of each document element;

a semantic structure judging step of judging a
semantic structure of said each document element; and

a document type definition generating step of
generating document type definition to define
appearance state of the document element in said
structured document based on judgment results of said
physical structure judging step and said semantic
structure judging step.


2. The document type definition generating method
according to claim 1, wherein said physical structure
judging step comprises judging the physical structure
of the document element based on an indention or a
blank line.






3. The document type definition generating method
according to claim 2, wherein when the physical
structure of the document element is judged based on
said indention, the judging is performed by excluding
the indention which represents quotation.

4. The document type definition generating method
according to claim 2, wherein when the physical
structure of the document element is judged based on
said blank line, the judging is performed by excluding
the blank line from a document in which description is
made by constantly placing every predetermined number
of blank lines.
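
Illustrative note (not part of the claims): the judging described in claims 2 to 4 might be sketched as follows. The function names, the ">" quotation marker, and the fixed-interval test for regularly placed blank lines are all assumptions made for illustration; the claims do not specify how quotation indents or regular blank-line layouts are detected.

```python
def is_regular_spacing(lines, min_blanks=3):
    """Crude heuristic (assumption): blank lines that recur at one fixed
    interval, e.g. double-spaced text, are layout rather than structure."""
    blanks = [i for i, ln in enumerate(lines) if not ln.strip()]
    if len(blanks) < min_blanks:
        return False
    gaps = {b - a for a, b in zip(blanks, blanks[1:])}
    return len(gaps) == 1


def judge_physical_structure(text, quote_markers=(">",)):
    """Split text into blocks of (indent, content) pairs.

    Blank lines separate blocks unless they are regularly placed.
    Lines starting with a quotation marker get indent 0, so a
    quotation indent is not mistaken for nesting (hypothetical rule).
    """
    lines = text.splitlines()
    use_blank_lines = not is_regular_spacing(lines)
    blocks, current = [], []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            if use_blank_lines and current:
                blocks.append(current)
                current = []
            continue
        is_quote = stripped.startswith(quote_markers)
        indent = 0 if is_quote else len(line) - len(line.lstrip())
        current.append((indent, stripped))
    if current:
        blocks.append(current)
    return blocks
```

In this sketch a double-spaced document collapses into a single block, while an irregularly placed blank line still starts a new block.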



5. The document type definition generating method
according to claim 1, wherein said physical structure
judging step comprises judging the physical structure
of the document element based on a positional relation
of the tags surrounding the document element.

6. The document type definition generating method
according to claim 1, wherein said semantic structure
judging step comprises referring to a semantic
information database to judge the semantic structure of
the document element based on words and phrases
connection in a document and word types.






7. The document type definition generating method
according to claim 1, wherein said semantic structure
judging step comprises judging the semantic structure
of the document element based on a meaning represented
by the tags surrounding the document element.

8. The document type definition generating method
according to claim 1, wherein said document type
definition generating step comprises a redundancy
removing step of, when the physical structure and the
semantic structure of a plurality of document elements
having the tags different in element name are similar,
regarding the document elements as being of the same
type and excluding one element name from a document
type definition generating object based on the judgment
results of said physical structure judging step and
said semantic structure judging step.

9. The document type definition generating method
according to claim 8, wherein said redundancy removing
step comprises obtaining similarity degrees concerning
agreement degrees of the physical structure and the
semantic structure between the document elements having
the tags different in element name, and regarding the
document elements as being of the same type when a
general similarity value calculated from the similarity
degrees is equal to or more than a predetermined
threshold value.
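
Illustrative note (not part of the claims): the threshold comparison of claim 9 might be sketched as follows. The claim does not say how the similarity degrees or the general similarity value are calculated. This sketch assumes each element carries a set of physical features and a set of semantic features, measures agreement with the Jaccard index, and combines the two degrees as a weighted mean. The names `jaccard` and `same_type`, the weights, and the default threshold are hypothetical.

```python
def jaccard(a, b):
    """Agreement degree of two feature sets, from 0.0 to 1.0 (assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def same_type(elem_a, elem_b, threshold=0.8, w_physical=0.5, w_semantic=0.5):
    """Return True when the general similarity value reaches the threshold.

    Each element is a dict with "physical" and "semantic" feature sets.
    The weighted mean used here is an assumption, not the claimed formula.
    """
    physical = jaccard(elem_a["physical"], elem_b["physical"])
    semantic = jaccard(elem_a["semantic"], elem_b["semantic"])
    general = w_physical * physical + w_semantic * semantic
    return general >= threshold
```

Under these assumptions, two differently named elements whose feature sets agree completely are regarded as the same type, so one of the two element names can be dropped.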



10. The document type definition generating 
method according to claim 1, wherein said document type 
definition generating step comprises a title changing 
step of, when the physical structure and the semantic 
structure of a plurality of document elements having
the tags with the same element name are different,
regarding the document elements as being of different
types and changing one element name based on the 
judgment results of said physical structure judging 
step and said semantic structure judging step. 

11. The document type definition generating 
method according to claim 1, wherein said document type 
definition generating step comprises analyzing words 
and phrases present between a start tag and an end tag 
having the same title, obtaining information to be 
included between the tags, and generating the document
type definition based on the information. 
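
Illustrative note (not part of the claims): one way to picture the last part of claim 11 is a pass that records what appears between each start tag and end tag and emits one element declaration per element name. The sketch below uses Python's standard `xml.etree.ElementTree` parser. It records only child element names and the presence of text, and does not attempt the word-and-phrase analysis itself. The function name `generate_dtd` and the loose `(a | b)*` content model are assumptions.

```python
import xml.etree.ElementTree as ET


def generate_dtd(xml_text):
    """Emit one <!ELEMENT ...> declaration per element name found."""
    root = ET.fromstring(xml_text)
    children = {}  # element name -> child element names, first-seen order
    has_text = {}  # element name -> True if any instance holds text
    for elem in root.iter():
        names = children.setdefault(elem.tag, [])
        # Text directly inside elem: its own text plus the tails of its children.
        parts = [elem.text or ""] + [child.tail or "" for child in elem]
        found = any(p.strip() for p in parts)
        has_text[elem.tag] = has_text.get(elem.tag, False) or found
        for child in elem:
            if child.tag not in names:
                names.append(child.tag)
    decls = []
    for name, kids in children.items():
        if kids and has_text[name]:
            model = "(#PCDATA | " + " | ".join(kids) + ")*"
        elif kids:
            model = "(" + " | ".join(kids) + ")*"
        elif has_text[name]:
            model = "(#PCDATA)"
        else:
            model = "EMPTY"
        decls.append(f"<!ELEMENT {name} {model}>")
    return "\n".join(decls)
```

The content model is deliberately permissive. A stricter generator would also infer the order and repetition of child elements.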

12. A document type definition generating
apparatus comprising: in a structured document provided
with a tag having an element name in each document
element,

physical structure judging means for judging a
physical structure of said each document element;

semantic structure judging means for judging a
semantic structure of said each document element; and

document type definition generating means for
generating document type definition to define
appearance state of the document element in said
structured document based on judgment results of said
physical structure judging means and said semantic
structure judging means.

13. The document type definition generating 
apparatus according to claim 12, wherein said physical
structure judging means judges the physical structure
of the document element based on an indention or a
blank line. 

14. The document type definition generating
apparatus according to claim 13, wherein said physical
structure judging means judges the physical structure
of the document element based on said indention by
excluding the indention which represents quotation.


15. The document type definition generating
apparatus according to claim 13, wherein said physical
structure judging means judges the physical structure
of the document element based on said blank lines by
excluding the blank lines from a document in which
description is made by constantly placing every
predetermined number of blank lines.



16. The document type definition generating 
apparatus according to claim 12, wherein said physical
structure judging means judges the physical structure
of the document element based on a positional relation
of the tags surrounding the document element.

17. The document type definition generating
apparatus according to claim 12, wherein said semantic
structure judging means refers to a semantic
information database to judge the semantic structure of
the document element based on words and phrases
connection in a document and word types.

18. The document type definition generating 
apparatus according to claim 12, wherein said semantic
structure judging means judges the semantic structure 
of the document element based on a meaning represented 
by the tags surrounding the document element. 

19. The document type definition generating
apparatus according to claim 12, wherein said document
type definition generating means comprises redundancy
removing means for, when the physical structure and the
semantic structure of a plurality of document elements
having the tags different in element name are similar,
regarding the document elements as being of the same
type and excluding one element name from a document
type definition generating object based on the judgment
results of said physical structure judging means and
said semantic structure judging means.

20. The document type definition generating
apparatus according to claim 19, wherein said
redundancy removing means obtains similarity degrees
concerning agreement degrees of the physical structure
and the semantic structure between the document
elements having the tags different in element name, and
regards the document elements as being of the same type
when a general similarity value calculated from the
similarity degrees is equal to or more than a
predetermined threshold value.



21. The document type definition generating 
apparatus according to claim 12, wherein said document 
type definition generating means comprises title
changing means for, when the physical structure and the
semantic structure of a plurality of document elements
having the tags with the same element name are
different, regarding the document elements as being of
different types and changing one element name based on
the judgment results of said physical structure judging 
means and said semantic structure judging means. 



22. The document type definition generating 
apparatus according to claim 12, wherein said document
type definition generating means analyzes words and
phrases present between a start tag and an end tag
having the same title, obtains information to be
included between the tags, and generates the document
type definition based on the information.



23. A computer-readable storage medium storing a
document type definition generating program for
controlling a computer to perform document type
definition generation, said program comprising codes
for causing the computer to perform:

in a structured document provided with a tag
having an element name in each document element, a
physical structure judging step of judging a physical
structure of each document element;

a semantic structure judging step of judging a
semantic structure of said each document element; and

a document type definition generating step of
generating document type definition to define
appearance state of the document element in said
structured document based on judgment results of said
physical structure judging step and said semantic
structure judging step.



